ABSTRACT

Most previously proposed dual-channel coherent-to-diffuse-ratio
(CDR) estimators are based on a free-field model. When used for
binaural signals, e.g., for dereverberation in binaural hearing aids,
their performance may degrade due to the influence of the head, even
when the direction of arrival of the desired speaker is exactly known.
In this paper, the head shadowing effect is taken into account for CDR
estimation by using a simplified model for the frequency-dependent
interaural time difference and a model for the binaural coherence of
the diffuse noise field. Evaluation of CDR-based dereverberation with
measured binaural impulse responses indicates that the proposed
binaural CDR estimators can improve PESQ scores.

Index Terms: Binaural speech dereverberation, interaural time
difference, coherent-to-diffuse ratio


1. INTRODUCTION 

Both speech quality and speech intelligibility may degrade
dramatically in reverberant and noisy environments. Many different
algorithms have been proposed during the past decades to suppress
noise and reverberation (see [1]-[3] and references therein). This
paper focuses on binaural speech dereverberation, where the binaural
signals are recorded with two microphones located at the two ears.

Previous studies have already shown that it is important to preserve
both the interaural time difference (ITD) and the interaural level
difference (ILD) cues when applying binaural dereverberation methods
for hearing aids [4]-[9], since localization of sound sources becomes
difficult when binaural cues are distorted [10]. This condition is
ensured by a two-channel postfiltering approach where the same gain is
applied to both channels [6]. In [6], Jeub et al. took the shadowing
effect of the head into account in the diffuse sound field model. In
[8], interaural coherence histograms were mapped to a gain function to
suppress the reverberant components in each frequency channel.


Recently, coherent-to-diffuse-ratio (CDR) estimators have been
proposed, which can be seen as an alternative formulation of
coherence-based dereverberation approaches [11]. In [6], the
assumption was made that binaural signals are time-aligned before
calculating the spectral weights of the Wiener filter. In [11], two
CDR estimators were proposed, where one requires knowledge of both the
direction of arrival (DOA) of the desired speaker and the spatial
coherence of the late reverberant speech, and the other does not need
the DOA information. In [12], Schwarz and Kellermann proposed improved
estimators for both the case of known and unknown DOA, which were
shown to lead to improved dereverberation performance (see
[12, Table III] for details). To the best of our knowledge, these CDR
estimators have not been applied to binaural dereverberation and their
performance in this setting has not been reported until now.

After briefly reviewing CDR estimators for free-field conditions,
i.e., for a sound field with no obstructions close to the microphones,
we describe models for the ITD and the coherence of diffuse noise
under the influence of the head in a binaural scenario, and show that
the direction-dependent CDR estimators based on a free-field
assumption are not robust under this model. We propose to modify the
CDR estimators to use binaural models. Experimental results confirm
that the proposed estimators achieve higher PESQ scores than the
free-field estimators when applied to coherence-based dereverberation.
The proposed binaural CDR estimators have numerous applications, such
as binaural hearing aids, robotics, or immersive audio communication
systems.


2. FREE-FIELD SIGNAL MODEL AND CDR ESTIMATION

We model two reverberant and noisy microphone signals $x_i(t)$,
$i = 1, 2$, as the sum of a desired speech component
$x_{i,\mathrm{coh}}(t)$ and an undesired component
$x_{i,\mathrm{diff}}(t)$ consisting of diffuse reverberation and/or
noise:

$$x_i(t) = x_{i,\mathrm{coh}}(t) + x_{i,\mathrm{diff}}(t). \qquad (1)$$
As in previous studies, we assume both microphones to be
omnidirectional and the desired component to be a plane wave in the
free (locally unobstructed) field, so that $x_{2,\mathrm{coh}}(t)$ is
a time-shifted version of $x_{1,\mathrm{coh}}(t)$ [11]:

$$x_{2,\mathrm{coh}}(t) = x_{1,\mathrm{coh}}(t - \tau_{12}), \qquad (2)$$


where $\tau_{12}$ is the time difference of arrival (TDOA) of the
desired sound between the first and the second microphone. The
free-field model for the spatial coherence between the desired speech
components at both microphones, $x_{1,\mathrm{coh}}(t)$ and
$x_{2,\mathrm{coh}}(t)$, is given by

$$\Gamma_{\mathrm{coh}}^{\mathrm{FF}}(f) = \exp(j 2\pi f \tau_{12}). \qquad (3)$$

If $\theta = 0°$ corresponds to the broadside direction, the TDOA in
the free field can be expressed as

$$\tau_{12} = d \sin\theta / c, \qquad (4)$$

where $d$ is the distance between the two microphones and $c$ is the
speed of sound.

The spatial coherence between the reverberation/noise components
$x_{1,\mathrm{diff}}(t)$ and $x_{2,\mathrm{diff}}(t)$ is given by the
spatial coherence function of two omnidirectional sensors in a diffuse
(spherically isotropic), locally unobstructed sound field:

$$\Gamma_{\mathrm{diff}}^{\mathrm{FF}}(f) = \frac{\sin(2\pi f d/c)}{2\pi f d/c}, \qquad (5)$$

where $f$ is the frequency in Hz. For the cylindrically isotropic
field, the spatial coherence is given by

$$\Gamma_{\mathrm{diff}}^{\mathrm{cyl\text{-}iso}}(f) = J_0(2\pi f d/c), \qquad (6)$$

where $J_0$ is the zeroth-order Bessel function of the first kind.
Generally, (5) often fits better than (6) in practical applications
[12]; therefore we use $\Gamma_{\mathrm{diff}}^{\mathrm{FF}}(f)$ in
this paper, although
$\Gamma_{\mathrm{diff}}^{\mathrm{cyl\text{-}iso}}(f)$ could be applied
analogously.

The CDR at the $i$-th microphone is given by

$$\mathrm{CDR}_i(k, f) = \frac{\Phi_{i,\mathrm{coh}}(k, f)}{\Phi_{i,\mathrm{diff}}(k, f)}, \qquad (7)$$

where $\Phi_{i,\mathrm{coh}}(k, f)$ and $\Phi_{i,\mathrm{diff}}(k, f)$
are the short-time power spectra of $x_{i,\mathrm{coh}}(t)$ and
$x_{i,\mathrm{diff}}(t)$, respectively, with the frame index $k$ and
frequency $f$ (we omit both $k$ and $f$ in the following for brevity).
We further assume that the power spectra are identical at the two
microphones for both the desired and the undesired component, i.e.,
$\Phi_{\mathrm{coh}} = \Phi_{1,\mathrm{coh}} = \Phi_{2,\mathrm{coh}}$
and
$\Phi_{\mathrm{diff}} = \Phi_{1,\mathrm{diff}} = \Phi_{2,\mathrm{diff}}$,
and therefore

$$\mathrm{CDR} = \mathrm{CDR}_1 = \mathrm{CDR}_2 = \frac{\Phi_{\mathrm{coh}}}{\Phi_{\mathrm{diff}}}. \qquad (8)$$

Using the models for the coherence of the desired and diffuse signal
components given above, and a short-time estimate of the coherence
between $x_1(t)$ and $x_2(t)$, which is denoted in the following as
$\hat{\Gamma}_x(k, f)$ and which may be obtained by recursive
averaging, it is possible to estimate the time- and
frequency-dependent CDR, as described in detail in [12]. The CDR
estimators which are evaluated in this paper are summarized in
Table 1.

Table 1: Summary of CDR estimators evaluated in this paper.
$\Gamma_{\mathrm{coh}}$ and $\Gamma_{\mathrm{diff}}$ indicate the model
coherence functions used for desired signal and diffuse noise,
respectively. $\hat{\Gamma}_x$ indicates the estimated coherence of
the mixed sound field. $\Re\{\cdot\}$ extracts the real part of a
complex value and $^*$ denotes the complex conjugate.

| Type                  | Estimator                             | Expression |
|-----------------------|---------------------------------------|------------|
| Direction-dependent   | $\hat{\eta}_{\mathrm{Schwarz1}}$   | see [12, Table III] |
| Direction-dependent   | $\hat{\eta}_{\mathrm{Schwarz2}}$   | contains the factor $\left| \Gamma_{\mathrm{coh}}^{*} (\Gamma_{\mathrm{diff}} - \hat{\Gamma}_x) / (\Re\{\Gamma_{\mathrm{coh}}^{*} \hat{\Gamma}_x\} - 1) \right|$; see [12, Table III] for the full expression |
| Direction-independent | $\hat{\eta}_{\mathrm{Thiergart2}}$ | $\Re\{ (\Gamma_{\mathrm{diff}} - \hat{\Gamma}_x) / (\hat{\Gamma}_x - \exp(j \arg \hat{\Gamma}_x)) \}$ |
| Direction-independent | $\hat{\eta}_{\mathrm{Schwarz3}}$   | [12, (25)] |


3. BINAURAL SIGNAL MODEL

When the two microphones are placed at the two ears, the ITD is the
propagation delay of the desired sound from the left ear to the right
ear and the ILD measures the power level difference between the two
microphones. Both the ITD and the ILD have already been widely
studied, and various models can be found in [13, 14] and references
therein. As in previous work, the impact of the ILD is neglected in
the following, i.e., we maintain the assumption of equal power at both
microphones. Based on this assumption, both the CDRs and the
postfilter gain functions are the same at the two microphones placed
at the two ears.

In this section, we first describe a simplified model for the
frequency-dependent ITD and use it to derive a coherence model for the
desired signal component. Then, we describe appropriate models for the
diffuse sound field coherence which account for the effect of the
head. Finally, we describe the application of these models to binaural
CDR estimation and compare the robustness of CDR estimators based on
the free-field model to the binaural CDR estimators.


3.1. ITD and Desired Signal Coherence Model 

Previous studies have shown that, unlike the TDOA in the free-field
case given by (4), the ITD is highly dependent on the frequency, the
azimuth angle, the elevation angle and the distance of the desired
speaker from the head [14]-[16]. Here, we use a simplified ITD model
to make it suitable for practical application to binaural
dereverberation. We assume that the distance of the desired speaker
from the head is larger than 1 m, and thus does not have a significant
effect on the ITD [15, Fig. 9]. Furthermore, we neglect separate
consideration of elevation and azimuth angles, and instead model the
ITD as a function of the angle $\theta$, which we define as the angle
between the direction of the desired speaker and the forward median
plane of the head. In the head-related spherical coordinate system,
$\theta = 0$ and $\theta = \pm\pi$ correspond to the forward and the
backward median planes of the head, respectively.
Fig. 1: Comparison of $\tau_{12}$ and $\tau_{lr}(f)$ versus the angle
of the desired sound for different frequencies.


Kuhn has shown that the ITD is frequency-dependent [16], which can be
approximately summarized as

$$\tau_{lr}^{\mathrm{Low}} = 1.5\, d \sin(\theta)/c, \quad \text{for } f < f_L = 500\ \mathrm{Hz}, \qquad (9)$$

and, for $f > f_H = 2000$ Hz,

$$\tau_{lr}^{\mathrm{High}} = \begin{cases}
0.5\, d\, (\sin(\theta) + \theta)/c, & \theta \in [-\pi/2, \pi/2], \\
0.5\, d\, (\sin(\theta) - (\pi + \theta))/c, & \theta \in [-\pi, -\pi/2], \\
0.5\, d\, (\sin(\theta) + (\pi - \theta))/c, & \theta \in [\pi/2, \pi],
\end{cases} \qquad (10)$$

where (9) and (10) are identical to [16, (7) and (12)], respectively.
However, for the middle frequency range, no explicit expression is
available. We propose to use linear interpolation to model the ITD in
the middle frequency range, which agrees well with the measurement
results in [16] and is given by

$$\tau_{lr}^{\mathrm{Mid}}(f) = \tau_{lr}^{\mathrm{Low}} + \frac{\tau_{lr}^{\mathrm{High}} - \tau_{lr}^{\mathrm{Low}}}{f_H - f_L}\, (f - f_L), \qquad (11)$$

where $f \in [500, 2000]$ Hz.

Compared to $\tau_{12}$, the ITD $\tau_{lr}(f)$ is a function not only
of the DOA but also of the frequency. $\tau_{12}$ and $\tau_{lr}(f)$
versus $\theta$ are plotted in Fig. 1 for different frequencies.
Fig. 1 shows that the difference between $|\tau_{lr}(f)|$ and
$|\tau_{12}|$ is largest for $f < 500$ Hz. For $f > 2000$ Hz,
$\tau_{lr}(f)$ is close to $\tau_{12}$ when $|\theta|$ or
$|\pi \pm \theta|$ is smaller than $\pi/4$, while $|\tau_{lr}(f)|$ is
much larger than $|\tau_{12}|$ for $|\theta|$ close to $\pi/2$.

Without the shadowing effect of the head, the free-field coherence
model of the desired signal is given by (3). Based on the
frequency-dependent ITD model which accounts for the head effect, we
can now define the coherence of the desired component for the binaural
case as

$$\Gamma_{\mathrm{coh}}^{\mathrm{Binaural}}(f) = \exp(j 2\pi f \tau_{lr}(f)). \qquad (12)$$
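The ITD model of (9)-(11) and the resulting coherence (12) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the speed of sound of 343 m/s and the microphone (ear) distance of 0.17 m are assumptions made here, and the coherence phase is taken as $2\pi f \tau$.

```python
import cmath
import math

C_SOUND = 343.0   # speed of sound in m/s (assumed value)
F_LOW = 500.0     # f_L in Hz, upper edge of the low-frequency range of (9)
F_HIGH = 2000.0   # f_H in Hz, lower edge of the high-frequency range of (10)


def tdoa_free_field(theta, d, c=C_SOUND):
    """Free-field TDOA of (4): tau_12 = d * sin(theta) / c."""
    return d * math.sin(theta) / c


def itd_low(theta, d, c=C_SOUND):
    """Low-frequency ITD of (9)."""
    return 1.5 * d * math.sin(theta) / c


def itd_high(theta, d, c=C_SOUND):
    """High-frequency ITD of (10); theta in radians within [-pi, pi]."""
    if theta < -math.pi / 2:
        arc = -(math.pi + theta)      # rear left quadrant
    elif theta > math.pi / 2:
        arc = math.pi - theta         # rear right quadrant
    else:
        arc = theta                   # frontal half plane
    return 0.5 * d * (math.sin(theta) + arc) / c


def itd_binaural(f, theta, d, c=C_SOUND):
    """Piecewise ITD model: (9) below f_L, (10) above f_H, (11) in between."""
    low = itd_low(theta, d, c)
    high = itd_high(theta, d, c)
    if f <= F_LOW:
        return low
    if f >= F_HIGH:
        return high
    return low + (high - low) / (F_HIGH - F_LOW) * (f - F_LOW)


def coherence_from_delay(f, tau):
    """Coherence of a pure delay tau at frequency f, as in (3) and (12)."""
    return cmath.exp(1j * 2.0 * math.pi * f * tau)


if __name__ == "__main__":
    d = 0.17                      # assumed ear distance in m
    theta = math.radians(60.0)
    for f in (250.0, 1000.0, 4000.0):
        tau_ff = tdoa_free_field(theta, d)
        tau_lr = itd_binaural(f, theta, d)
        print(f, tau_ff, tau_lr, coherence_from_delay(f, tau_lr))
```

Below 500 Hz the modelled ITD is 1.5 times the free-field TDOA for every angle, which is the largest deviation between the two models noted above.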


3.2. Diffuse Noise Coherence Model 

The shadowing effect of the head also has an impact on the spatial
coherence of the two microphone signals in a diffuse sound field. Both
theoretical and experimental results can be found in [17, 18]. Here we
use the analytic representation of the binaural correlation function
proposed by Lindevald and Benade [17], given by

$$\Gamma_{\mathrm{diff}}^{\mathrm{Binaural}}(f) = \frac{1}{\sqrt{1 + (\beta\, 2\pi f d/c)^4}} \cdot \frac{\sin(\alpha\, 2\pi f d/c)}{\alpha\, 2\pi f d/c}, \qquad (13)$$

where $\alpha = 2.2$ and $\beta = 0.5$.

Fig. 2: CDR estimation error $\Delta$ of the free-field estimator
versus $f$ and $\theta$ for (a) the input CDR $\eta_{in} = -20$ dB,
(b) the input CDR $\eta_{in} = 20$ dB.

The binaural CDR estimators are now obtained by inserting the binaural
coherence models $\Gamma_{\mathrm{coh}}^{\mathrm{Binaural}}(f)$ and
$\Gamma_{\mathrm{diff}}^{\mathrm{Binaural}}(f)$ into the estimators
given in Table 1. This extension makes the direction-dependent CDR
estimators suitable for binaural dereverberation. The corresponding
estimators are denoted as $\hat{\eta}_{\bullet}^{\mathrm{Binaural}}$
in the following, where $\bullet$ represents the name of the technique
that is being used.
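To illustrate how the two diffuse-field coherence models differ, the following sketch evaluates the free-field model (5) and the binaural model (13) side by side. The function names, the ear distance of 0.17 m and the speed of sound of 343 m/s are assumptions of this example; only $\alpha = 2.2$ and $\beta = 0.5$ come from the text.

```python
import math


def _sinc(x):
    """Unnormalized sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


def diffuse_coherence_free_field(f, d, c=343.0):
    """Spherically isotropic free-field coherence of (5)."""
    return _sinc(2.0 * math.pi * f * d / c)


def diffuse_coherence_binaural(f, d, c=343.0, alpha=2.2, beta=0.5):
    """Binaural diffuse coherence of (13) with the parameters of the text."""
    kd = 2.0 * math.pi * f * d / c
    return _sinc(alpha * kd) / math.sqrt(1.0 + (beta * kd) ** 4)


if __name__ == "__main__":
    d = 0.17  # assumed ear distance in m
    for f in (0.0, 250.0, 500.0, 1000.0, 2000.0):
        print(f, diffuse_coherence_free_field(f, d),
              diffuse_coherence_binaural(f, d))
```

Both models equal 1 at $f = 0$. With the assumed geometry, the binaural model decays faster with frequency, which reflects the lower noise coherence between the two ears.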

3.3. Robustness of the Free-Field Estimators in the Binaural Scenario

This part evaluates the robustness of the direction-dependent CDR
estimators using the free-field model against the shadowing effect of
the head. Due to the limited space of this paper, only
$\hat{\eta}_{\mathrm{Schwarz2}}^{\mathrm{FF}}$ is compared with
$\hat{\eta}_{\mathrm{Schwarz2}}^{\mathrm{Binaural}}$, since a previous
study [12] has already shown that $\hat{\eta}_{\mathrm{Schwarz2}}$ has
the best performance among the direction-dependent CDR estimators in
Table 1 (see [12, Table III] for details). For the comparison, we
generate values of the mixture coherence $\Gamma_x$ for a certain
input CDR $\eta_{in}$ and different angles and frequencies according
to the binaural coherence models defined above, and insert these
coherence values into the free-field estimator. We then define the
estimation error of the free-field CDR estimator compared to the true
CDR $\eta_{in}$ as

$$\Delta = 10 \log_{10} \hat{\eta}_{\mathrm{Schwarz2}}^{\mathrm{FF}} - 10 \log_{10} \eta_{in}. \qquad (14)$$

Fig. 2 plots $\Delta$ versus $f$ and $\theta$ for the true input CDR
$\eta_{in} = -20$ dB (a) and $\eta_{in} = 20$ dB (b). Only
$\theta \in [0, \pi/2]$ is considered due to the symmetry of the
scenario. Fig. 2 shows that the CDR is somewhat overestimated for low
input CDR, while for high input CDR, the CDR is seriously
underestimated for angles larger than 45°. The influence of the head
on the coherence, especially on that of the desired speech component,
is significant enough to deteriorate the performance of the free-field
CDR estimator considerably.

Table 2: PESQ scores (left/right ear) averaged over all angles for
the CDR estimators in Table 1 using free-field (FF) or binaural
coherence models. Schwarz 1 and Schwarz 2 are direction-dependent;
Thiergart 2 and Schwarz 3 are direction-independent.

| AIR distance | Unprocessed | Schwarz 1, FF | Schwarz 1, Binaural | Schwarz 2, FF | Schwarz 2, Binaural | Thiergart 2, FF | Thiergart 2, Binaural | Schwarz 3, FF | Schwarz 3, Binaural |
|--------------|-------------|---------------|---------------------|---------------|---------------------|-----------------|-----------------------|---------------|---------------------|
| 1 m          | 2.24/2.25   | 2.40/2.41     | 2.65/2.68           | 2.57/2.59     | 2.69/2.71           | 2.66/2.67       | 2.64/2.65             | 2.65/2.67     | 2.64/2.65           |
| 2 m          | 1.88/1.90   | 2.00/2.00     | 2.12/2.13           | 2.10/2.10     | 2.17/2.18           | 2.16/2.17       | 2.15/2.15             | 2.16/2.17     | 2.15/2.16           |
| 3 m          | 1.77/1.77   | 1.85/1.84     | 1.91/1.90           | 1.92/1.91     | 1.97/1.96           | 1.95/1.95       | 1.95/1.95             | 1.96/1.96     | 1.95/1.95           |
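The procedure of the robustness analysis in Section 3.3 can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: the mixture coherence is written as $\Gamma_x = (\eta\,\Gamma_{\mathrm{coh}} + \Gamma_{\mathrm{diff}})/(\eta + 1)$, which follows from the equal-power model of Section 2 when the two components are mutually uncorrelated; the estimator is a plain algebraic inversion of that relation, used as a stand-in for $\hat{\eta}_{\mathrm{Schwarz2}}$ (whose full expression is in [12]); and the ear distance, speed of sound, frequency and angle are example values. All function names are hypothetical.

```python
import cmath
import math


def mixture_coherence(cdr, g_coh, g_diff):
    """Coherence of the mixture for a given CDR (assumed signal model)."""
    return (cdr * g_coh + g_diff) / (cdr + 1.0)


def cdr_by_inversion(g_x, g_coh, g_diff):
    """Stand-in estimator: real part of the inverted mixture relation."""
    return ((g_diff - g_x) / (g_x - g_coh)).real


def error_db(cdr_est, cdr_true, floor=1e-6):
    """Estimation error as in (14); negative estimates are floored first."""
    return 10.0 * math.log10(max(cdr_est, floor)) - 10.0 * math.log10(cdr_true)


def sinc(x):
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


# Example scenario (all values assumed): low-frequency bin, lateral source.
d, c, f, theta = 0.17, 343.0, 400.0, math.radians(60.0)
kd = 2.0 * math.pi * f * d / c

# Binaural "ground truth" models: ITD of (9) since f < 500 Hz, and (13).
g_coh_bin = cmath.exp(1j * 2.0 * math.pi * f * 1.5 * d * math.sin(theta) / c)
g_diff_bin = sinc(2.2 * kd) / math.sqrt(1.0 + (0.5 * kd) ** 4)

# Free-field models (3)-(5) used by the mismatched estimator.
g_coh_ff = cmath.exp(1j * 2.0 * math.pi * f * d * math.sin(theta) / c)
g_diff_ff = sinc(kd)

eta_in = 100.0  # true input CDR, i.e. 20 dB
g_x = mixture_coherence(eta_in, g_coh_bin, g_diff_bin)

eta_matched = cdr_by_inversion(g_x, g_coh_bin, g_diff_bin)
eta_mismatched = cdr_by_inversion(g_x, g_coh_ff, g_diff_ff)

if __name__ == "__main__":
    print("matched models:   ", eta_matched, error_db(eta_matched, eta_in))
    print("free-field models:", eta_mismatched, error_db(eta_mismatched, eta_in))
```

With matched binaural models the inversion recovers the input CDR, while inserting the free-field models yields an estimate far below the true value for this lateral, high-CDR case. Because the estimator is only a stand-in, the numbers are not comparable to Fig. 2; the sketch only shows the mechanics of generating $\Gamma_x$ and evaluating (14).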


4. EVALUATION 


This section evaluates the application of the CDR estimators in
Table 1 with the free-field and binaural coherence models to the
problem of dereverberation. We use the Aachen Impulse Response (AIR)
database [19], which consists of binaural RIRs measured with a dummy
head at azimuth angles from -90° to 90° in 15° increments and
source-head distances from 1 m to 3 m in 1 m increments.

Ten clean speech samples (five female and five male speakers) are
taken from the TIMIT database [20]. The reverberant speech samples are
generated by convolving the clean speech with the “stairway” RIRs from
the AIR database. We use the same filterbank, postfilter gain function
and parameters as in [12, (29)], with the CDR estimators in Table 1.
Knowledge of the true DOA is assumed for computation of the desired
signal coherence models. The gain function is applied to the two
microphone signals separately. PESQ is chosen as evaluation measure
since it was found to be highly correlated with speech quality for the
evaluation of noise and reverberation suppression methods [21, 22].
Here, we give raw MOS scores obtained by wideband PESQ. The PESQ
scores of the two microphone signals and those of the processed
signals are given separately. Note that the average PESQ scores for
both ears are very similar, due to the symmetry of the scenario. The
experimental results for the different distances are presented in
Table 2. From these results, we can make the following observations:

(1) Using the ITD and binaural diffuse coherence models improves all
of the direction-dependent CDR estimators.

(2) The direction-independent CDR estimators, which do not rely on a
model of the desired signal coherence, are robust in the binaural
case, even when using the free-field diffuse coherence model. This
indicates that the choice of the diffuse coherence model is not
critical, and the main effect of the head is on the ITD.

(3) The direction-independent estimators almost reach the performance
of the best binaural direction-dependent estimator.

Fig. 3: PESQ scores versus DOA for the 1 m distance case:
(a) direction-dependent CDR estimators; (b) direction-independent
CDR estimators. “Left” represents the unprocessed signal recorded by
the microphone located at the left ear.

To reveal the mechanism behind the better performance of the proposed
direction-dependent binaural CDR estimators, the PESQ scores versus
$\theta$ are plotted in Fig. 3 (due to symmetry, only PESQ scores for
the left microphone are shown). As can be seen from this figure,
$\hat{\eta}_{\mathrm{Schwarz2}}^{\mathrm{Binaural}}$ is much better
than $\hat{\eta}_{\mathrm{Schwarz2}}^{\mathrm{FF}}$ for
$|\theta| > 45°$. This phenomenon can be explained by the robustness
analysis results in Fig. 2, where it was found that the estimation
error of $\hat{\eta}_{\mathrm{Schwarz2}}^{\mathrm{FF}}$ becomes
significant for $|\theta| > 45°$. However,
$\hat{\eta}_{\mathrm{Schwarz3}}^{\mathrm{FF}}$ and
$\hat{\eta}_{\mathrm{Schwarz3}}^{\mathrm{Binaural}}$ have nearly the
same performance for all angles, which confirms that the choice
between $\Gamma_{\mathrm{diff}}^{\mathrm{FF}}(f)$ and
$\Gamma_{\mathrm{diff}}^{\mathrm{Binaural}}(f)$ is not critical for
the direction-independent CDR estimators.

The estimators $\hat{\eta}_{\mathrm{Thiergart2}}$ and
$\hat{\eta}_{\mathrm{Schwarz3}}$ show similar behavior in this
scenario, although the former is biased [11]. This can be explained by
the fact that the bias is roughly proportional to the noise coherence
and disappears for $\Gamma_{\mathrm{diff}} \to 0$; since, for binaural
signals, the noise coherence is lower than for the setup investigated
previously, due to the large spacing of the sensors and the shadowing
effect of the head, the practical impact of the bias is not
significant here.






























5. CONCLUSIONS 

This paper extends previously proposed free-field CDR estimators to
binaural dereverberation by using a simplified model for the ITD.
Experimental results show that this extension is important for the
direction-dependent CDR estimators, where PESQ scores for
dereverberation can be significantly improved. It is further shown
that the direction-independent CDR estimators, which do not require a
model of the desired signal coherence, can achieve similar performance
and are robust towards the shadowing effect of the head. Further work
could concentrate on studying the impact of the ILD on binaural
dereverberation and the theoretical limits of the CDR estimators by
using statistical analysis [23].

REFERENCES 

[1] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing
Techniques and Applications. Berlin: Springer-Verlag, 2001.

[2] J. Benesty, S. Makino, and J. Chen, Speech Enhancement. Berlin:
Springer-Verlag, 2005.

[3] P. A. Naylor and N. D. Gaubitch, Speech Dereverberation. London:
Springer-Verlag, 2010.

[4] J. B. Allen, D. A. Berkley, and J. Blauert. “Multimicrophone
signal-processing technique to remove room reverberation from speech
signals.” J. Acoust. Soc. Am., vol. 62, pp. 912-915, 1977.

[5] K. Lebart, J. Boucher, and P. Denbigh. “A binaural system for the
suppression of late reverberation.” in Proc. EUSIPCO, Island of
Rhodes, Greece, 1998.

[6] M. Jeub, M. Schafer, T. Esch, and P. Vary. “Model-based
dereverberation preserving binaural cues.” IEEE Trans. Audio, Speech,
and Lang. Process., vol. 18, pp. 1732-1745, 2010.

[7] A. Kuklasinski, S. Doclo, S. H. Jensen, and J. Jensen. “Maximum
likelihood based multi-channel isotropic reverberation reduction for
hearing aids.” in Proc. EUSIPCO, Lisbon, Portugal, 2014.

[8] A. Westermann, J. M. Buchholz, and T. Dau. “Binaural
dereverberation based on interaural coherence histograms.” J. Acoust.
Soc. Am., vol. 133, pp. 2767-2777, 2013.

[9] A. Tsilfidis, A. Westermann, J. M. Buchholz, E. Georganti, and
J. Mourjopoulos. Binaural Dereverberation. Berlin: Springer-Verlag,
2013.

[10] V. Hamacher, J. Chalupper, J. Eggers, E. Fischer, U. Kornagel,
H. Puder, and U. Rass. “Signal Processing in High-End Hearing Aids:
State of the Art, Challenges, and Future Trends.” EURASIP J. on Adv.
in Signal Process., vol. 18, pp. 2915-2929, 2005.


[11] O. Thiergart, G. Del Galdo, and E. A. P. Habets.
“Signal-to-reverberant ratio estimation based on the complex spatial
coherence between omnidirectional microphones.” in Proc. ICASSP,
Kyoto, Japan, 2012.

[12] A. Schwarz and W. Kellermann. “Coherent-to-diffuse power ratio
estimation for dereverberation.” IEEE/ACM Trans. on Audio, Speech and
Lang. Process., vol. 23, pp. 1006-1018, 2015.

[13] J. Blauert. Spatial Hearing. Cambridge, MA: The MIT Press, 1997.

[14] J. Blauert. The Technology of Binaural Listening.
Berlin-Heidelberg-New York: Springer-Verlag, 2013.

[15] T. Qu, Z. Xiao, M. Gong, Y. Huang, X. Li, and X. Wu.
“Distance-dependent head-related transfer functions measured with
high spatial resolution using a spark gap.” IEEE Trans. on Audio,
Speech, and Lang. Process., vol. 17, no. 6, pp. 1124-1132, Aug. 2009.

[16] G. F. Kuhn. “Model for the interaural time differences in the
azimuthal plane.” J. Acoust. Soc. Am., vol. 62, pp. 157-167, 1977.

[17] I. M. Lindevald and A. H. Benade. “Two-ear correlation in the
statistical sound fields of rooms.” J. Acoust. Soc. Am., vol. 80,
pp. 661-664, 1986.

[18] M. Jeub, M. Dorbecker, and P. Vary. “A semi-analytical model for
the binaural coherence of noise fields.” IEEE Signal Process. Letters,
vol. 18, pp. 197-200, 2011.

[19] M. Jeub, M. Schafer, and P. Vary. “A binaural room impulse
response database for the evaluation of dereverberation algorithms.”
in Proc. Int. Conf. Digital Signal Process. (DSP), Santorini, Greece,
2009.

[20] J. S. Garofolo. “Getting Started With the DARPA TIMIT CD-ROM: An
Acoustic-Phonetic Continuous Speech Database.” Nat. Inst. of Standards
and Technology (NIST), Gaithersburg, MD, 1993.

[21] Y. Hu and P. C. Loizou. “Evaluation of objective quality measures
for speech enhancement.” IEEE Trans. Audio, Speech and Lang. Process.,
vol. 16, pp. 229-238, 2008.

[22] S. Goetze, A. Warzybok, I. Kodrasi, J. O. Jungmann, B. Cauchi,
J. Rennies, E. A. P. Habets, A. Mertins, T. Gerkmann, S. Doclo, and
B. Kollmeier. “A Study on Speech Quality and Speech Intelligibility
Measures for Quality Assessment of Single-Channel Dereverberation
Algorithms.” in Proc. IWAENC, Antibes, France, 2014.

[23] C. Zheng, H. Liu, R. Peng, and X. Li. “A Statistical Analysis of
Two-Channel Post-Filter Estimators in Isotropic Noise Fields.” IEEE
Trans. on Audio, Speech, and Lang. Process., vol. 21, pp. 336-342,
2013.


